Abstract

The authors have applied an advanced set of auto-regressive
tools for identifying potentially complex, linear and
non-linear relationships in data, wherein the underlying physical
relationships are not well described. In this paper these tools
and techniques are described in detail, and the results of
applying them to the evaluation of diesel engine
lubricating oil health (based on electrochemical impedance
spectroscopy data) are detailed. It is demonstrated that highly
accurate  models  can  be  constructed  which  take  as  input 
features  derived  from  diesel  engine  lubricating  oil 
electrochemical  impedance  spectroscopy  data  and  output 
estimates  of  traditional  laboratory  based  oil  analysis 
parameters.  The  electrochemical  impedance  spectroscopy 
and  laboratory  analytical  data  used  are  from  a  field 
deployment  of  oil  condition  sensors  on  several  long-haul 
class  8  diesel  trucks.  The  dataset  was  divided  into  training 
and  test  datasets  and  goodness  of  fit  metrics  were  calculated 
to  evaluate  model  performance.  Models  were  successfully 
generated  for  nitration,  soot  content,  total  base  number,  total 
acid  number,  and  viscosity. 

1.  Introduction 

An  on-line  oil  condition  monitoring  device  for  application 
to  vehicular  diesel  engines  provides  significant  benefit  over 
traditional  oil  sampling  methods.  The  online  nature  of  the 
monitoring  device  eliminates  the  long  delays  associated 
with  traditional  laboratory  analysis  and  prevents  the 
possibility  of  sampling  errors.  Knowledge  of  the  actual 
condition  of  the  oil  at  a  particular  time  also  allows  for  the 
real  time  adjustment  of  oil  drain  intervals  -  either  extending 
to  take  advantage  of  additional  remaining  useful  life  or 
shortening to prevent engine damage due to abnormal fluid
conditions or contaminations. Maintenance actions can also
be  planned  and  carried  out  opportunistically. 

It  has  long  been  known  that  electrochemical  impedance 
spectroscopy  (EIS)  can  provide  valuable  insight  into  the 
condition  of  lubricating  oils  and  their  additive  packages 
(Byington et al., 2010; Moffatt et al., 2012). To mature
this understanding, research within this field has focused on
characterizing the relationship between lubricating oils and
electrochemical impedance spectroscopy. Lvovich, V. F. and
Smiechowski, M. F. (2011, 2008, 2006, 2005, 2002, 2001)
are  the  primary  contributors  to  this  characterization  and 
have  produced  several  well  behaved  models  of  the 
relationship.  While  these  models  provide  tremendous 
insight  into  lubricant  chemistry,  they  are  based  on  empirical 
data  from  laboratory  grade  instrumentation  and  known  oil 
formulations and contaminants. For on-line lubricant
monitors, the oil formulation and contamination are unknown,
and therefore samples must be drawn and traditional oil
analysis performed. These traditional laboratory tests
typically  output  lubricant  chemical  and  mechanical 
properties  such  as  Total  Acid  Number  (TAN),  Total  Base 
Number  (TBN),  percent  soot  content,  viscosity,  and  degree 
of  nitration,  among  others. 

The  work  presented  in  this  paper  extends  the  scope  of 
previous  modeling  research  by  establishing  a  direct  map 
between  on-line  oil  sensor  features  and  the  underlying  oil 
chemistry  assessed  through  traditional  laboratory  analysis. 
While correlations have been observed between these on-line
EIS data and those values which represent the output of
traditional laboratory oil analysis (Mackos et al., 2008),
models  have  not  been  developed  to  explicate  this 
relationship.  While  EIS  data  alone  can  be  used  to  generate 
lubricant  remaining  useful  life  estimates,  using  models  to 
estimate traditional laboratory oil analysis parameters
provides additional benefits; for example, historical
condemnation  thresholds  established  using  these  traditional 
oil  analysis  parameters  can  be  leveraged. 

The  physics  of  the  relationship  between  measured 
electrochemical  data  and  laboratory  test  outputs  like  TAN 
and  TBN  is  vastly  complex.  Therefore  the  explicit  definition 
of  transfer  functions  to  translate  EIS  data  into  the  desired 
laboratory  test  features  is  difficult  and  impractical.  Several 
methods exist for modeling complex scientific data. When
the parametric relationships between measured data are
known from expert knowledge, fixed-form models can be
applied. When the relationships are not well understood,
numerical models such as neural networks or naive Bayes
classifiers are often used. These models, however, do
not explain discovered relationships intuitively and thus do
not easily distill data into scientific knowledge. Instead, the
authors  have  pursued  the  application  of  symbolic  regression 
techniques  which  require  no  a  priori  knowledge  of  the 
functional  relationship  between  the  inputs  and  desired 
outputs  of  such  a  model  and  result  in  a  closed  form  solution 
which  may  describe  physical  and  chemical  relationships 
more  clearly. 

The  authors  have  been  working  with  US  Army  Tank 
Automotive  Research,  Development  and  Engineering 
Center  (TARDEC)  to  develop  next  generation  hardware  for 
online  oil  condition  monitoring.  As  part  of  this  effort,  and 
with  the  cooperation  of  the  National  Automotive  Center, 
several  sets  of  existing  oil  condition  monitoring  hardware 
were  deployed  on  long  haul  class  8  trucks.  A  periodic  oil 
sampling  and  laboratory  analysis  plan  was  also 
implemented.  These  laboratory  analytical  and  EIS  data 
were  used  to  evaluate  the  capability  of  symbolic  regression 
techniques  to  generate  models  for  estimating  TAN,  TBN, 
nitration,  soot  content,  and  viscosity. 

2.  Symbolic  Regression  Overview 

The  main  objective  of  this  effort  was  to  correlate  laboratory 
generated  tribology  results  with  sensor  generated 
electrochemical  impedance  spectroscopy  data.  While  there 
is  prior  understanding  of  the  chemical  and  physical  nature  of 
oil  and  how  it  interacts  with  contaminants  and  other 
breakdown  processes,  this  understanding  has  never  directly 
resulted  in  models  that  correlate  tribology  data  to  EIS  data. 
Given the ground truth information, this is a supervised
learning problem, and since the tribology data is not
discretized, a regression method is appropriate (rather than
classifier methods such as logistic regression, neural
networks, support vector machines, etc.). Multivariate
linear regression is the obvious and standard method; if the
specific model for optimization is known, then symbolic
regression is unnecessary. If, however, the model is
unknown, the application of linear regression is a labor
intensive process that includes adding and removing
features, increasing and decreasing the complexity of the
features included, cross-validation, and regularization. The
application of Symbolic Regression, and the toolsets which
were employed, effectively automates these processes.
Symbolic  regression  also  provides  significant  benefit  over 
linear  regression  when  the  ultimate  goal  is  to  deploy  the 
models in an embedded environment. As with linear regression,
a closed form equation is generated; however, the operations
allowed in the solutions can be defined ahead of time, so
that any limitations of the embedded platform can be
accounted for. Solutions of varying levels
of  complexity  can  also  be  generated  and  evaluated  to  trade 
off  performance  in  terms  of  accuracy  and  computational 
complexity. 

The  Symbolic  Regression  algorithm  described  in  this 
section  is  used  to  identify  general  and  potentially  complex 
relationships,  in  this  case  between  the  online  oil-condition 
monitor  observations  and  associated  laboratory  generated 
oil  chemical  and  mechanical  properties.  The  Symbolic 
Regression algorithm (Koza, 1992) is a generalization of the
standard regression problem formulation in that it requires
very few assumptions regarding the underlying regression
model, and the output of the algorithm is a closed form
expression that can easily be implemented on an embedded
platform. The produced closed form expressions can be
non-linear and have temporal dependencies and as a result
important  information  such  as  leading  fault  (temporal)  or 
cyclic  degradation  (non-linear)  can  be  identified  using  this 
technique.  In  short,  the  Symbolic  Regression  technique  is  an 
excellent  choice  when  faced  with  complex  problems  where 
many  of  the  underlying  physical  behaviors  of  a  system  are 
not  well  described. 

The  Symbolic  Regression  tool  used  for  this  analysis  relies 
on  Genetic  Programming  (Koza,  1998)  to  search  for  the  best 
functional/algebraic  map  between  the  produced  oil 
condition  monitor  features  and  the  oil  analysis  results 
reported  by  the  laboratory.  The  Genetic  Programming 
algorithm  evaluates  a  pool  of  symbolic  expressions 
represented  by  a  collection  of  parse  trees  (one  such  tree  is 
depicted  in  Figure  1)  and  iteratively  applies  candidate 
selection,  cross-over  and  mutation  operations  to  generate  the 
most  effective  expressions.  The  fitness  of  each  expression 
can  be  evaluated  using  many  different  metrics;  however,  for 
the  analysis  performed  in  this  work,  the  mean  absolute  error 
was  used.  As  with  most  data  driven  modeling  tools,  special 
attention  must  be  paid  to  avoid  over-fitting  the  derived 
model  to  the  provided  data.  For  Symbolic  Regression  the 
over-fitting  problem  is  addressed  at  two  different  levels. 
First,  the  algorithm  provides  a  Pareto  front  of  optimal 
solutions  that  allows  the  researcher  to  select  the  ideal 
solution  in  terms  of  functional  complexity  and  performance. 
For  instance,  if  a  simple  solution  performs  only  slightly 
worse  than  a  much  more  complex  expression,  the  Symbolic 
Regression  tool  will  provide  both  solutions  to  the  researcher 
who  can  then  select  the  correct  solution  in  terms  of 
complexity and performance. By providing this
functionality, it is possible to eliminate overly complex
expressions  that  tend  to  be  highly  tuned  to  the  training  data. 
The  second  technique  used  to  prevent  over-fitting  is  the 
standard  cross-validation  approach.  That  is,  the  generated 
expressions  are  optimized  to  fit  a  training  set,  say  80%  of 
the  original  data,  but  when  evaluating  the  performance  of 
the  expressions  the  remaining  20%  of  the  data  is  used.  This 
simple  approach  reduces  the  likelihood  of  over-fitting. 
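
As a minimal sketch of these two safeguards (not the implementation used by the Symbolic Regression tool), the following Python example scores a few candidate expressions by mean absolute error on a held-out 20% of the data and keeps only the candidates on the Pareto front of complexity versus error. The candidate expressions, their complexity scores, and the synthetic data are assumptions made purely for illustration.

```python
import random

def mean_absolute_error(model, rows):
    """Average |prediction - target| over (features, target) rows."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)

def split_train_test(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def pareto_front(candidates):
    """Keep candidates that no other candidate beats on complexity and error.

    Each candidate is a (name, complexity, error) tuple; lower is better
    for both complexity and error.
    """
    front = []
    for name, cx, err in candidates:
        dominated = any(
            (c2 <= cx and e2 <= err) and (c2 < cx or e2 < err)
            for _, c2, e2 in candidates
        )
        if not dominated:
            front.append((name, cx, err))
    return sorted(front, key=lambda c: c[1])

# Synthetic data: target = 2*x + 1 (illustrative only, not the field data).
rows = [((x,), 2.0 * x + 1.0) for x in range(20)]
train, test = split_train_test(rows)

# Hypothetical candidate expressions with hand-assigned complexity scores.
models = {
    "constant": (1, lambda f: 20.0),
    "linear": (3, lambda f: 2.0 * f[0] + 1.0),
    "padded": (7, lambda f: 2.0 * f[0] + 1.0 + 0.0 * f[0] ** 3),
}

# Errors are computed on the held-out test rows, never on the training rows.
scored = [(name, cx, mean_absolute_error(fn, test))
          for name, (cx, fn) in models.items()]
front = pareto_front(scored)
```

In this toy case the more complex "padded" expression is dropped because the simpler "linear" expression achieves the same held-out error, which mirrors the selection the researcher makes from the Pareto front.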


Figure 1. An example of a parse tree corresponding to the
expression 2+3+x*7+Y/5.
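
To make the parse tree representation concrete, the sketch below encodes the expression from Figure 1 as nested tuples and evaluates it recursively. The tuple encoding, the function set, and the node-count complexity measure are illustrative assumptions; the tree shape actually drawn in Figure 1 and the internal representation used by the Genetic Programming tool may differ.

```python
import operator

# Function set available to the search; it could be restricted to the
# operations supported by an embedded target.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def evaluate(node, variables):
    """Recursively evaluate a parse tree.

    A node is a number (constant terminal), a string (variable terminal),
    or a tuple (function_symbol, left_subtree, right_subtree).
    """
    if isinstance(node, tuple):
        symbol, left, right = node
        return FUNCTIONS[symbol](evaluate(left, variables),
                                 evaluate(right, variables))
    if isinstance(node, str):
        return variables[node]
    return node

def size(node):
    """Count nodes in the tree, a simple proxy for expression complexity."""
    if isinstance(node, tuple):
        return 1 + size(node[1]) + size(node[2])
    return 1

# One possible tree for 2 + 3 + x*7 + Y/5.
tree = ("+", ("+", ("+", 2, 3), ("*", "x", 7)), ("/", "Y", 5))
value = evaluate(tree, {"x": 1.0, "Y": 10.0})  # 2 + 3 + 7 + 2
```

Cross-over and mutation then amount to swapping or replacing subtrees of such structures, and the node count gives one way to rank expressions by complexity.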

In  addition  to  the  ease  of  implementing  the  derived 
expressions  on  an  embedded  platform,  it  is  also  possible  to 
analyze  the  individual  terms  in  each  of  the  expressions  to 
determine  what  their  impact  may  be  on  the  overall  model 
response.  This  sensitivity  analysis  step  provides  insight  into 
how  important  each  term  is,  and  also  into  what  features 
should  be  generated  by  the  oil  condition  monitor. 

It  is  worth  noting  that  the  Symbolic  Regression  analysis  is 
only  performed  during  the  development  of  the  oil 
assessment  model.  That  is,  the  symbolic  regression  process 
will  not  be  running  on  the  sensor  itself;  only  the  functional 
output  of  the  symbolic  regression  process  would  be 
considered  for  embedded  implementation.  It  should  also  be 
noted  that  Symbolic  Regression  analysis  tools  are  freely 
available  to  developers  through  the  Eureqa  software 
developed  by  a  group  of  researchers  at  Cornell  University 
(Schmidt  &  Lipson,  2009).  This  tool  is  mature  and  allows 
users  to  distribute  the  search  task  over  a  large  number  of 
computers through Amazon Cloud Services
(http://aws.amazon.com/).

3.  Application  To  Diesel  Engine  Lubricating  Oil 

3.1.  Description  of  the  Dataset  Used 

The  underlying  technology  for  the  oil  condition  monitor 
detailed  herein  is  electrochemical  impedance  spectroscopy 
(EIS),  wherein  the  fluid  under  test  is  subjected  to  a  dynamic 
electrical  signal  and  the  fluid’s  effects  on  the  signal  are 
measured  and  correlated  to  various  chemical  and  physical 
phenomena.  The  oil  condition  monitor’s  embedded 
algorithm trends temperature-normalized and filtered
electrochemical impedances measured at a high frequency
(HF), medium frequency (MF), and low frequency (LF).

As  previously  described,  a  field  deployment  on  several  long 
haul  class-8  trucks  was  used  to  generate  the  necessary  EIS 
and  laboratory  analytical  data  for  this  effort.  Across  these 
installations,  online  oil  condition  monitoring  devices 
collected  data  continuously  for  several  months,  resulting  in 
a  dataset  which  spanned  more  than  ten  oil  changes. 

Throughout  most  of  the  test  period,  oil  samples  were  taken 
from  the  vehicles  and  sent  to  a  third  party  laboratory  for 
analysis.  Three  of  the  trucks  in  the  installation  were  selected 
for  inclusion  in  the  symbolic  regression  study  based  on  the 
quality  and  consistency  of  their  corresponding  data  sets. 

In the following section, the outputs of the models generated
through the application of the previously described symbolic
regression techniques are presented against laboratory
analytical data for comparison.

3.2.  Symbolic  Regression  Results 

Symbolic regression models were created for the following
laboratory generated analyticals: nitration, TBN, TAN, soot,
and viscosity. These models are represented by closed form
mathematical  expressions  suitable  for  implementation  in 
embedded  hardware.  An  example  of  the  kind  of  expressions 
that  comprised  these  models  is  given  in  Eq.  (1)  below, 
wherein  Feature  1  is  one  of  the  electrochemical  features 
generated  by  the  oil  condition  monitor  and  X,  Y,  and  Z  are 
constants. 

TBN est. = log(Feature 1 - XeY) - Z    (1)

To ensure that the models did not over-fit the data, the model
performance metrics were computed by performing
cross-validation using 50% of the data. For each laboratory
analytical  a  single  model  was  created  based  on  data  from  all 
of  the  trucks  so  that  the  repeatability  of  the  model  across 
different  oil  condition  monitoring  hardware  and  different 
vehicles  could  be  evaluated. 

In  Figure  2  below,  the  model  based  Nitration  estimate  is 
represented  by  black  data  points.  The  vertical  lines  indicate 
when  an  oil  change  occurred.  As  expected,  the  nitration 
level  dropped  after  each  oil  change.  The  squares  indicate  the 
nitration  measurement  made  off-line  through  laboratory 
analysis  using  oil  samples  drawn  from  each  truck  during  the 
test.  The  squares  are  plotted  along  the  x-axis  according  to 
when  the  sample  was  drawn.  Also  note  that  more  online  EIS 
results  have  been  acquired  than  laboratory  analytical  data  at 
the  time  of  writing  of  this  paper.  This  is  especially  the  case 
for  truck  3,  for  which  oil  samples  have  not  yet  been  received 
for  operation  after  Jan  21st. 

The  model  performs  well  across  all  three  trucks  and  across 
the  entire  test.  The  only  laboratory  measurement  that  did  not 
line  up  well  with  model  results  was  the  first  sample  drawn 
in March on truck 1. It is more likely that the laboratory
measurement is wrong than the sensor model estimate,
since it is unlikely that the nitration level increased,
decreased and increased again across one oil cycle; this is
not typical of nitration trending. The sensor installed on
truck 2 also underestimated nitration on the fifth oil cycle.

Figure 2: Nitration estimate plotted with Lab Nitration
Measurements


Statistical  results  are  summarized  in  Table  1  and  a 
histogram  of  the  differences  between  the  model  predicted 
values  and  the  ground  truth  is  depicted  in  Figure  3. 


Standard Deviation of the Residual    1.7934 (Abs)
Mean of the Residual                  -0.0380 (Abs)

Table 1: Nitration Model Performance Metrics

[Histogram: Nitration estimate error; x-axis: Residual]

Figure 3: Histogram capturing error between the model
generated Nitration value and the laboratory results


The  model  appears  to  perform  well  given  that  the  Nitration 
values  observed  in  the  laboratory  data  ranged  from  6  to  24 
(Abs).  That  is,  the  standard  deviation  of  the  modeling  error 
is 9.9% of the range of the laboratory measurements.

Figure 4: Soot estimate plotted with Lab Soot
Measurements

Figure  4  shows  the  results  from  the  same  data  set  but 
applying  the  soot  estimation  model  and  comparing  with 
laboratory  soot  measurements. 

The model performs well across all three trucks and across
the entire test. The first sample drawn in March on truck 1
again lines up poorly with the on-line data. Because the same
sample also disagreed with the nitration model, the
discrepancy is most likely due to an unrepresentative oil
sample being drawn or analyzed. The sensor installed on
truck 2 also underestimated soot content on the fourth oil
cycle.


The  statistical  results  are  summarized  in  Table  2  and  a 
histogram  of  the  differences  between  the  model  predicted 
values  and  the  ground  truth  is  depicted  in  Figure  5. 


Standard Deviation of the Residual    0.0722 (%)
Mean of the Residual                  -0.1373 (%)

Table 2: Soot Model Performance Metrics

The  model  appears  to  perform  well  given  that  the  Soot 
values  observed  in  the  laboratory  data  ranged  from  0.5  to  2 
(%).  That  is,  the  standard  deviation  of  the  modeling  error  is 
4.5%  of  the  range  of  the  laboratory  measurements.  Based  on 
this  observation,  the  soot  estimation  model  was  the  highest 
performer  among  the  5  models  calculated. 

Figure  6  shows  the  results  from  the  same  data  set  but 
applying  the  TBN  estimation  model  and  comparing  them 
with laboratory TBN measurements.

Figure 5: Histogram capturing error between the model
generated Soot value and the laboratory results

Figure 6: TBN estimate plotted with Lab TBN
Measurements

The model performs well across all three trucks and across
the entire test. It produces a greater number of extreme
outliers than the Soot and Nitration models, but its
performance within one standard deviation remains healthy,
as shown in Table 3 below.


Standard Deviation of the Residual    0.3220 (mgKOH/g)
Mean of the Residual                  -0.1940 (mgKOH/g)

Table 3: TBN Model Performance Metrics


Figure 7: Histogram capturing error between the model
generated TBN value and the laboratory results

Given that the TBN values observed in the laboratory data
ranged from 2.8 to 7.9 (mgKOH/g), the standard deviation
of the modeling error is still only 6.3% of the range of the
laboratory measurements. Therefore, while the TBN model
has a greater number of residual outliers than, for example,
the Nitration model, it still outperforms the Nitration model
the majority of the time.
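
The goodness of fit metrics quoted for each model can be reproduced with a few lines of code. The sketch below is illustrative only: it assumes the residual is defined as the model estimate minus the laboratory value and that a sample standard deviation is used, neither of which is stated explicitly in the text. The final line applies the percent-of-range calculation to the TBN figures given above.

```python
import statistics

def residual_metrics(estimates, lab_values):
    """Return (mean, sample standard deviation) of the residuals.

    The residual is taken here as estimate minus laboratory value;
    this sign convention is an assumption.
    """
    residuals = [e - m for e, m in zip(estimates, lab_values)]
    return statistics.mean(residuals), statistics.stdev(residuals)

def std_as_percent_of_range(residual_std, lab_min, lab_max):
    """Express the residual standard deviation as a percent of the lab range."""
    return 100.0 * residual_std / (lab_max - lab_min)

# Values quoted in the text for the TBN model (Table 3 and the lab range).
tbn_percent = std_as_percent_of_range(0.3220, 2.8, 7.9)
```

Normalizing by the observed laboratory range puts models with different units on a common scale, which is what allows the five models to be ranked against one another.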


Figure  8  shows  the  results  from  the  same  data  set  but 
applying  the  TAN  estimation  model  and  comparing  with 
laboratory TAN measurements.

Figure 8: TAN estimate plotted with Lab. TAN
Measurements

TAN appears to be the worst performer of the five models
created. Although the model does appear to show a
correlation, the statistical performance shows that it does
not perform well enough to be relied upon.

Standard Deviation of the Residual    1.2993 (mgKOH/g)
Mean of the Residual                  0.5829 (mgKOH/g)

Table 4: TAN Model Performance Metrics

Considering  that  the  laboratory  data  ranged  from  1.79  to 
4.22  (mgKOH/g),  the  standard  deviation  of  the  modeling 
error  is  over  50%  of  the  range  of  laboratory  measurements. 
In other words, the confidence bounds of the estimate extend
to over half the range of typical data.

Finally,  Figure  10  shows  the  results  from  the  same  data  set 
but  applying  the  Viscosity  estimation  model  and  comparing 
with  laboratory  Viscosity  measurements. 


Figure 9: Histogram capturing error between the model
generated TAN value and the laboratory results

[Plot legend: Viscosity Model Estimate (points), Lab. Viscosity Measurement (squares), Oil Change (lines)]


The model performs well across all three trucks, with
perhaps slightly weaker performance on the truck 3 data.
The statistical results are summarized in Table 5 and
Figure 11.


Standard Deviation of the Residual    0.1188 (cSt @100C)
Mean of the Residual                  -0.0673 (cSt @100C)

Table 5: Viscosity Model Performance Metrics

Figure 11: Histogram capturing error between the model
generated Viscosity value and the laboratory results

The model appears to perform well given that the Viscosity
values observed in the laboratory data ranged from 14.4 to
16.3 (cSt @100C). That is, the standard deviation of the
modeling error is 6.3% of the range of the laboratory
measurements.

3.3.  RUL  Estimation  Plan 

The  data  shown  in  Figure  2,  Figure  4,  Figure  6,  Figure  8  and 
Figure  10  can  be  reconditioned  to  display  the  features  vs. 
hours  on  oil  by  identifying  top-ups  and  oil  changes  and 
adjusting  the  time  on  oil  accordingly.  The  resulting 
reconditioned  data  for  Nitration,  Soot,  TBN  and  Viscosity 
estimates  are  shown  from  Figure  12  through  Figure  15.  The 
bands  of  data  are  represented  by  a  family  of  feature  curves. 
The  traditional  oil  analysis  results  are  also  plotted  on  each 
plot as colored squares. Each deterministic curve, after
filtering for noise, is monotonically increasing and can be
fitted to a general function form, nth order or exponential,
depending on the feature type. As one can readily see, while
there is significant spread of the values, the trend on each is
clear and a regressive model can be used to predict a future
threshold exceedance on any parameter.
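
As an illustration of how such a regressive model might be used, the sketch below fits a first-order (linear) trend and an exponential trend to feature values versus hours on oil, then solves for the hours at which the fitted trend reaches a condemnation threshold. The data points and thresholds are hypothetical, and a plain least-squares fit is an assumption; the text does not specify the fitting method.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) via a linear fit to log(y); requires y > 0."""
    b, log_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_a), b

def hours_to_threshold_linear(slope, intercept, threshold):
    """Hours on oil at which the fitted line reaches the threshold."""
    return (threshold - intercept) / slope

def hours_to_threshold_exponential(a, b, threshold):
    """Hours on oil at which the fitted exponential reaches the threshold."""
    return math.log(threshold / a) / b

# Hypothetical soot estimates (%) versus hours on oil.
hours = [0.0, 50.0, 100.0, 150.0, 200.0]
soot = [0.50, 0.75, 1.00, 1.25, 1.50]
slope, intercept = fit_line(hours, soot)
soot_limit_hours = hours_to_threshold_linear(slope, intercept, 2.0)

# Hypothetical exponentially growing feature.
feature = [math.exp(0.01 * h) for h in hours]
a, b = fit_exponential(hours, feature)
feature_limit_hours = hours_to_threshold_exponential(a, b, math.e)
```

Subtracting the current hours on oil from the predicted crossing time gives a point estimate of remaining useful life; the probabilistic treatment of the spread around that estimate is a separate step.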


Figure 10: Viscosity estimate plotted with Lab. Viscosity
Measurements

Figure 12: Hours on Oil vs. Nitration Estimate

Figure 13: Hours on Oil vs. Soot Estimate

Figure 14: Hours on Oil vs. TBN Estimate

Figure 15: Hours on Oil vs. Viscosity Estimate

Given  the  nature  of  the  data  and  in  order  to  better 
approximate  the  uncertainty  band  for  each  considered 
feature,  a  Monte  Carlo  method  was  chosen  to  estimate 
remaining  useful  life  probabilistic  outputs.  The  approach 
starts  with  identifying  the  core  parameter  drivers  of  the 
model  and  assigning  an  initial  distribution  to  each  variable. 
The  drivers  can  be  identified  by  performing  a  sensitivity 
analysis  to  quantify  the  influence  of  a  parameter  with 
respect  to  the  probabilistic  outputs,  such  as  the  remaining 
useful  life  distribution.  A  Monte  Carlo  simulation  is  then 
performed  and  consists  of  randomly  sampling  from  the 
initial  distributions  and  running  the  models  into  the  future 
over  a  predefined  operating  profile. 


Figure  16:  Monte  Carlo  Probabilistic  Method 

Figure  16  is  an  illustration  of  the  approach.  By  sampling  the 
initial  parameter  distribution,  a  family  of  model  curves  is 
generated  and  can  be  used  to  calculate  vertical  or  horizontal 
slice  predictions.  A  horizontal  slice  is  generally  taken  at  the 
critical  damage  level  and  will  generate  a  distribution  on  time 
to  critical  damage  which  also  represents  the  remaining 
useful  life  probabilistic  output.  A  vertical  slice  is  taken  at 
any  point  in  time  and  represents  a  distribution  on  predicted 
damage at specified time t.

It was also decided to limit the number of input
parameter distributions to three or fewer. The more
distributions are sampled, the more simulations
are needed to obtain a good approximation of the
uncertainty spread. Different sampling methods, such as
importance sampling, can be applied to reduce the
simulation time and still output a reasonable approximation
of the spread.
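
The Monte Carlo procedure and the two kinds of slices can be sketched in a few lines. In the example below, the linear degradation model, the two normally distributed input parameters (starting value and growth rate), and all numeric values are assumptions chosen for illustration; they are not the models or distributions used in this work.

```python
import random
import statistics

def simulate_curves(n_runs, seed=0):
    """Sample (start, rate) pairs from the assumed input distributions.

    Each pair defines one model curve: feature(t) = start + rate * t.
    """
    rng = random.Random(seed)
    curves = []
    for _ in range(n_runs):
        start = rng.gauss(0.5, 0.05)
        # Guard against non-positive rates so every curve reaches the threshold.
        rate = max(rng.gauss(0.005, 0.0005), 1e-6)
        curves.append((start, rate))
    return curves

def horizontal_slice(curves, threshold):
    """Time at which each curve reaches the threshold (time-to-damage samples)."""
    return [(threshold - start) / rate for start, rate in curves]

def vertical_slice(curves, t):
    """Predicted feature value of each curve at time t."""
    return [start + rate * t for start, rate in curves]

curves = simulate_curves(1000)
time_to_threshold = horizontal_slice(curves, threshold=2.0)
feature_at_100h = vertical_slice(curves, t=100.0)

median_time = statistics.median(time_to_threshold)
mean_feature = statistics.mean(feature_at_100h)
```

With only two sampled parameters, a thousand runs already give a stable picture of the spread; each additional input distribution enlarges the space that must be sampled, which is the motivation for keeping the number of distributions small.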

One  of  the  advantages  of  this  approach  is  the  ability  to 
optionally  update  the  initial  input  distributions  given  ground 
truth  information.  The  underlying  assumption  is  that  if  the 
module  has  access  to  accurate  oil  analyticals,  these  results 
can  be  used  to  update  the  initial  distributions.  By  producing 
more  accurate  initial  conditions  for  the  prognosis  model,  the 
system  is  capable  of  improving  the  subsequent  prognosis 
results. 


The RUL determination is a direct adaptation of the authors'
prior work in health-based prognostics. The previously
demonstrated method uses Particle Filters to perform feature
trend predictions (Zhang et al., 2008). Particle filtering is an
application  of  Bayesian  state  estimation  that  calculates  an  a 
posteriori  probability  density  function  (PDF)  of  a  state  of  a 
system  based  on  a  priori  observations  or  measurements.  If 
the  calculation  of  the  future  state  of  the  system  is  extended 
in  multiple  steps  with  the  use  of  a  model,  the  particle 
filtering  algorithm  can  perform  long  term  predictions.  In  this 
case,  the  system  observations  are  initially  used  to  build  a 
PDF  of  the  “present”  or  “current”  system  condition,  as 
illustrated  conceptually  in  Figure  17. 


[Figure labels: Diagnostic horizon, Prediction horizon]


Just  as  with  the  initial  state,  future  states  of  the  system  can 
be  represented  by  PDFs.  Once  the  progression  of  the  system 
state  has  been  determined,  the  algorithm  can  be  used  to 
predict  the  time  required  for  the  system  to  reach  a  condition 
of  interest,  such  as  a  need  for  maintenance.  The  condition 
predicted  is  represented  by  a  “prediction  threshold”  line. 
Because  there  is  uncertainty  in  the  future  system  states  (as 
represented  by  the  different  state  progression  curves),  there 
is  also  uncertainty  in  the  predicted  time  to  reach  the 
threshold.  This  uncertainty  in  time  is  represented  also  by  a 
PDF,  referred  to  as  the  “time-to-threshold”  (TTT)  PDF.  The 
definition  of  prognostic  confidence  is  tied  to  how  the  area  of 
the TTT PDF is divided. To determine the minimum time
remaining to reach the prediction threshold, called the
“just-in-time” point, a confidence specification is required.
Figure 19 illustrates how a 95% prediction confidence is used to
determine the just-in-time point. This approach has been
successfully  used  in  a  range  of  different  mechanical  and 
electrical  prognosis  problems. 
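
One way to read the just-in-time point is as a lower percentile of the TTT distribution. The sketch below assumes that interpretation: with 95% confidence, the point is chosen so that only 5% of the time-to-threshold samples fall before it. Both this interpretation and the sample values are assumptions for illustration.

```python
def just_in_time_point(ttt_samples, confidence=0.95):
    """Return the time before which only (1 - confidence) of samples fall."""
    ordered = sorted(ttt_samples)
    index = int(round((1.0 - confidence) * len(ordered)))
    index = min(index, len(ordered) - 1)
    return ordered[index]

# Hypothetical time-to-threshold samples: 1, 2, ..., 100 hours.
samples = list(range(1, 101))
jit = just_in_time_point(samples, confidence=0.95)
fraction_before = sum(1 for s in samples if s < jit) / len(samples)
```

Raising the confidence level moves the just-in-time point earlier, trading remaining oil life for a lower chance of reaching the threshold before maintenance is performed.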


Figure  17.  Determination  of  the  state  of  a  system  as  a  PDF 
based  on  feature  values 

This PDF is then sampled into “particles” representative of
potential system states with individual weights. Using the
model, the prognostic algorithm simulates the progression
of the weights in time to predict possible future
system states, as illustrated in Figure 18.
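
The following is a minimal sketch of a generic bootstrap particle filter followed by a long term prediction step, offered to illustrate the idea rather than to reproduce the authors' method. The scalar growth model, the Gaussian noise levels, the resampling scheme, and the measurements are all assumptions.

```python
import math
import random

def predict(particles, rate, process_std, rng):
    """Advance each particle one step with the growth model plus noise."""
    return [p + rate + rng.gauss(0.0, process_std) for p in particles]

def update(particles, measurement, meas_std):
    """Weight particles by a Gaussian measurement likelihood, normalized."""
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in particles]
    total = sum(weights)
    return [w / total for w in weights]

def resample(particles, weights, rng):
    """Draw a new equally weighted particle set in proportion to the weights."""
    return rng.choices(particles, weights=weights, k=len(particles))

def run_filter(measurements, rng, n_particles=500, rate=0.01,
               process_std=0.005, meas_std=0.05):
    """Estimate the current state distribution from the measurement history."""
    particles = [rng.gauss(measurements[0], meas_std)
                 for _ in range(n_particles)]
    for z in measurements[1:]:
        particles = predict(particles, rate, process_std, rng)
        weights = update(particles, z, meas_std)
        particles = resample(particles, weights, rng)
    return particles

def steps_to_threshold(particles, threshold, rng, rate=0.01,
                       process_std=0.005, max_steps=1000):
    """Propagate particles with no new measurements until each crosses."""
    result = []
    for p in particles:
        steps = 0
        while p < threshold and steps < max_steps:
            p += rate + rng.gauss(0.0, process_std)
            steps += 1
        result.append(steps)
    return result

rng = random.Random(0)
measurements = [0.50, 0.51, 0.52, 0.53, 0.54]  # hypothetical feature values
particles = run_filter(measurements, rng)
ttt = steps_to_threshold(particles, threshold=1.0, rng=rng)
state_mean = sum(particles) / len(particles)
ttt_median = sorted(ttt)[len(ttt) // 2]
```

The list `ttt` plays the role of a sampled time-to-threshold distribution: its spread reflects both the uncertainty in the present state and the process noise accumulated over the prediction horizon.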


Figure  19.  Determination  of  the  prediction  time  to  reach  a 
prognostic  threshold  with  a  given  prognostic  confidence  (the 
inlay  box  provides  an  example  using  95%  confidence) 

4.  Conclusion 

Of the five models created, all but one performed well
enough for embedded implementation in a next generation
online oil condition monitor, with the soot model performing
best and the TBN and Viscosity models a close second.

It  should  be  noted  however  that  the  laboratory  data  ranges 
mentioned  for  each  analytical  also  represent  boundaries 
within  which  the  model  can  be  implemented.  If,  for 
example,  soot  content  extends  above  2%,  the  model  can  no 
longer  be  trusted  to  perform  with  the  stated  accuracy.  Once 
data  is  acquired  outside  the  range  of  this  data  set,  the  models 
can  be  matured  to  handle  the  increased  range  and  the 
performance  of  the  model  will  have  to  be  reassessed. 

Another limitation of the models is the single oil type in
use during this field trial. To remove potential installation or
vehicle specific artifacts contained in the models, they will
need to be verified against more diverse data sets.

Lastly, an approach and framework have been offered to
extend this regressive analysis approach to perform
real-time prediction of oil RUL. Several specific methods were
offered  to  handle  the  uncertainty  and  also  produce  a  useful 
prediction  in  automated  software.  Realization  of  this 
technology  will  not  only  allow  for  improved  equipment 
protection  and  enhance  the  underlying  oil-wetted 
component  effective  reliability  with  its  ability  to  look  for 
contaminants  and  aging/wear  out  mechanisms,  but  it  will 
also  allow  both  oil  sampling/lab  tests  and  oil  changes  to  be 
performed  on  a  predictive  condition-based  schedule.  Thus, 
this  technology  has  the  ability  to  provide  significant  return 
of  value  to  the  operator  and  maintainer  as  well  as  provide 
environmental/green  movement  benefits  with  the  reduction 
in oil usage and subsequent disposal.